
22 research outputs found

On Use of Task Independent Training Data in Tandem Feature Extraction

Authors: Hermansky Hynek; Sivadas Sunil
Publication venue: Martigny, Switzerland, IDIAP
Publication date: 10/03/2006

This paper addresses whether a feature extraction module trained on large amounts of task-independent data can improve the performance of stochastic models. We show that when only a small amount of task-specific training data is available, tandem features trained on task-independent data give considerable improvement over Perceptual Linear Prediction (PLP) cepstral features in Hidden Markov Model (HMM) based speech recognition systems.

Infoscience - École polytechnique fédérale de Lausanne

Using RASTA in task independent TANDEM feature extraction

Authors: Aradilla Guillermo; Dines John; Sivadas Sunil
Publication venue: Martigny, Switzerland
Publication date: 10/03/2006

In this work, we investigate the use of the RASTA filter in the TANDEM feature extraction method when it is trained with task-independent data. The RASTA filter removes the linear distortion introduced by the communication channel, as demonstrated by an 18% relative improvement on the Numbers 95 task. In addition, combining TANDEM features with conventional PLP features yielded a relative improvement of 35% over the basic PLP features.

Infoscience - École polytechnique fédérale de Lausanne

Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

Authors: Leung Cheung Chi; Ma Bin; Ni Chongjia; Sivadas Sunil; Tong Rong; Wang Lei
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/10/2022

This paper provides an overall introduction to our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As little existing work has been carried out on such regional languages, a few difficulties must be addressed before building the systems, including limited speech and text resources and a lack of linguistic knowledge. This work takes Bahasa Indonesia and Thai as examples to illustrate strategies for collecting the various resources required for building ASR systems. Comment: Published by the 2017 IEEE International Conference on Orange Technologies (ICOT 2017).

arXiv.org e-Print Archive

Multi-resolution Spectral Entropy Based Feature for Robust ASR

Authors: Bourlard Hervé; Ikbal Shajith; Misra Hemant; Sivadas Sunil
Publication venue: Philadelphia, U.S.A.
Publication date: 10/03/2006

Recently, entropy measures at different stages of recognition have been used in automatic speech recognition (ASR) tasks. In a recent paper, we proposed that the formant positions of a spectrum can be captured by a multi-resolution spectral entropy feature. In this paper, we suggest modifications to the spectral entropy feature extraction approach and compute the entropy contribution of each sub-band to the total entropy of the normalized spectrum. Further, we explore the ideas of overlapping sub-bands and the time derivatives of the spectral entropy feature. The modified feature is robust to additive wide-band noise and performs well at low SNRs. Finally, in the TANDEM framework, we show that the system using combined entropy and PLP features works better than the baseline PLP feature for additive wide-band noise at different SNRs.

Infoscience - École polytechnique fédérale de Lausanne
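
The sub-band entropy contribution mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the spectrum is normalized over all bins so that it sums to one, and that each sub-band's contribution is the sum of -p log2 p over its bins. The function name, the equal-width non-overlapping band layout, and the choice of resolutions are illustrative assumptions, not details taken from the paper.

```python
import math

def subband_entropy_contributions(power_spectrum, num_bands):
    """Entropy contribution of each sub-band to the full-band spectral entropy.

    Hypothetical helper: equal-width, non-overlapping bands are an assumption.
    """
    total = sum(power_spectrum)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    # Normalize over the whole spectrum so it behaves like a probability mass function.
    pmf = [x / total for x in power_spectrum]
    n = len(pmf)
    contributions = []
    for b in range(num_bands):
        start = b * n // num_bands
        stop = (b + 1) * n // num_bands
        # Bins with zero mass contribute nothing to the entropy.
        h = -sum(p * math.log2(p) for p in pmf[start:stop] if p > 0)
        contributions.append(h)
    return contributions

def multi_resolution_entropy(power_spectrum, resolutions=(1, 2, 4)):
    """Concatenate sub-band contributions computed at several resolutions (assumed set)."""
    feature = []
    for num_bands in resolutions:
        feature.extend(subband_entropy_contributions(power_spectrum, num_bands))
    return feature
```

Because the normalization is done over the full spectrum, the contributions at any one resolution add up to the full-band entropy. A peaky (formant-like) spectrum gives low values in the bands holding the peaks, while a flat, noise-like spectrum gives high values.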

Entropy Based Combination of Tandem Representations for Noise Robust ASR

Authors: Bourlard Hervé; Hermansky Hynek; Ikbal Shajith; Misra Hemant; Sivadas Sunil
Publication venue: Jeju Island, Korea
Publication date: 10/03/2006

In this paper, we present an entropy-based method to combine tandem representations of the recently proposed Phase AutoCorrelation (PAC) based features and Mel-Frequency Cepstral Coefficient (MFCC) features. PAC-based features, which are derived from a nonlinear transformation of autocorrelation coefficients and have been shown to be noise robust, become more robust to additive noise in their tandem representation. MFCC features in their tandem representation, on the other hand, show a significant improvement in recognition performance on clean speech. The entropy-based combination method investigated in this paper adaptively gives a higher weighting to the representation of MFCC features in clean speech and to the representation of PAC-based features in noisy speech, thus yielding robust recognition performance in all conditions.

Infoscience - École polytechnique fédérale de Lausanne
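
The adaptive weighting described in the abstract above can be sketched with inverse-entropy weighting, one common way to combine posterior streams. The abstract does not state the exact combination rule, so the weighting formula, the function names, and the `eps` guard below are assumptions for illustration only.

```python
import math

def entropy(posteriors):
    """Shannon entropy (in nats) of one frame of class posteriors."""
    return -sum(p * math.log(p) for p in posteriors if p > 0)

def inverse_entropy_combine(stream_a, stream_b, eps=1e-10):
    """Combine two posterior vectors for the same frame.

    Assumed rule: each stream is weighted by the inverse of its entropy,
    so the more confident (lower-entropy) stream dominates.
    """
    inv_a = 1.0 / (entropy(stream_a) + eps)  # eps avoids division by zero
    inv_b = 1.0 / (entropy(stream_b) + eps)
    w_a = inv_a / (inv_a + inv_b)
    w_b = 1.0 - w_a
    combined = [w_a * a + w_b * b for a, b in zip(stream_a, stream_b)]
    return combined, (w_a, w_b)

# Hypothetical frame: one stream is confident, the other is close to uniform.
confident = [0.9, 0.05, 0.05]
uncertain = [1 / 3, 1 / 3, 1 / 3]
combined, weights = inverse_entropy_combine(confident, uncertain)
```

Under this rule the weights are recomputed for every frame, so whichever stream produces sharper posteriors at that moment receives the larger weight, and the combined vector still sums to one.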


Pushing the Envelope—Aside

Authors: Athineos Marios; Bourlard Hervé; Chen Barry; Doddington George; Ellis Daniel P. W.; Hermansky Hynek; Jain Pratibha; Morgan Nelson; Ostendorf Mari; Shinozaki Takahiro; Sivadas Sunil; Stolcke Andreas; Sönmez Kemal; Zhu Qifeng; Çetin Özgür
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2005

Despite successes, there are still significant limitations to speech recognition performance, particularly for conversational speech and/or for speech with significant acoustic degradations from noise or reverberation. For this reason, the authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. Note in passing that we and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization (by cepstral mean subtraction, as described in B. Atal (1974), or by relative spectral analysis (RASTA), based on H. Hermansky and N. Morgan (1994)). They have also proposed features that are based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.

Columbia University Academic Commons
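
Cepstral mean subtraction, cited in the abstract above as a long-time-range normalization, can be sketched in a few lines. This is a minimal per-utterance version, assuming the features arrive as a list of equal-length cepstral vectors; the function name and data layout are illustrative and not taken from the article.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the utterance from every frame.

    frames: list of equal-length cepstral vectors (one per analysis frame).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(frame[k] for frame in frames) / num_frames for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]

# Hypothetical data: the same utterance with and without a constant cepstral offset,
# which is how a fixed linear channel appears in the cepstral domain.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]
```

Applying the function to `clean` and to `shifted` gives the same normalized features up to floating-point rounding, because the constant offset is absorbed into the mean. The mean is taken over the whole utterance, which is why the abstract counts this as processing over a long time range.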